Corpus Annotation By Generation

نویسندگان

Elke Teich

John A. Bateman

Richard Eckart

چکیده

As the interest in annotated corpora is spreading, there is increasing concern with using existing language technology for corpus processing. In this paper we explore the idea of using natural language generation systems for corpus annotation. Resources for generation systems often focus on areas of linguistic variability that are under-represented in analysis-directed approaches. Therefore, making use of generation resources promises some significant extensions in the kinds of annotation information that can be captured. We focus here on exploring the use of the KPML (Komet-Penman MultiLingual) generation system for corpus annotation. We describe the kinds of linguistic information covered in KPML and show the steps involved in creating a standard XML corpus representation from KPML’s gener-

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Using an Annotated Corpus As a Knowledge Source For Language Generation

We present an approach to building a realization component for an NLG system that makes use of an annotated corpus as the source of linguistic knowledge. Three issues that we focus on include: modeling of the generation decisions, corpus annotation and specification of the system architecture.

متن کامل

Automatic prominence annotation of a German speech synthesis corpus: towards prominence-based prosody generation for unit selection synthesis

This paper describes work directed towards the development of a syllable prominence-based prosody generation functionality for a German unit selection speech synthesis system. A general concept for syllable prominence-based prosody generation in unit selection synthesis is proposed. As a first step towards its implementation, an automated syllable prominence annotation procedure based on acoust...

متن کامل

Gross-grained RST through XML Metadata for Multilingual Document Generation

We present an RST-based discourse annotation proposal used in the construction of a trial multilingual XML-tagged corpus of teaching material in Basque, English and Spanish. The corpus feeds an experimental multilingual document generation system for the web. The main contributions of this paper are an implementation of RST through XML metadata and the adoption of gross-grained RST to avoid non...

متن کامل

Implementation and Optimization of Annotation and Interpretation Step of Next-Generation Sequencing Data for Non-Syndromic Autosomal Recessive Hearing Loss

Introduction: The precision and time required for analysis of data in next-generation sequencing (NGS) depends on many factors including the tools utilized for alignment, variant calling, annotation and filtering of variants, personnel expertise in data analysis and interpretation, and computational capacity of the lab and its optimization is a challenging task. Method: An application software...

متن کامل

An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2006

Corpus Annotation By Generation

نویسندگان

چکیده

منابع مشابه

Using an Annotated Corpus As a Knowledge Source For Language Generation

Automatic prominence annotation of a German speech synthesis corpus: towards prominence-based prosody generation for unit selection synthesis

Gross-grained RST through XML Metadata for Multilingual Document Generation

Implementation and Optimization of Annotation and Interpretation Step of Next-Generation Sequencing Data for Non-Syndromic Autosomal Recessive Hearing Loss

An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

عنوان ژورنال:

اشتراک گذاری